Genetic Algorithms for Selection and Partitioning of Attributes in Large-Scale Data Mining Problems

Authors

William H. Hsu

Michael Welge

Jie Wu

Ting-Hao Yang

Abstract

This paper proposes and surveys genetic implementations of algorithms for selection and partitioning of attributes in large-scale concept learning problems. Algorithms of this type apply relevance-determination criteria to choose a subset of the attributes specified for the original data set. The selected attributes are used to define new data clusters, which serve as intermediate training targets. The purpose of this change-of-representation step is to improve the accuracy of supervised learning on the reformulated data. Domain knowledge about these operators has been shown to reduce the number of fitness evaluations required for candidate attributes. This paper examines the genetic encoding of attribute selection and partitioning specifications, and the encoding of domain knowledge about operators in a fitness function. The goal of this approach is to improve on existing search-based algorithms (wrappers) in terms of training-sample efficiency. Several GA implementations of alternative attribute synthesis algorithms, both search-based and knowledge-based, are surveyed, and their application to large-scale concept learning problems is discussed.
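The abstract describes encoding attribute selection and partitioning specifications as chromosomes and folding domain knowledge into the fitness function. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a bitstring encoding for selection (bit i = 1 keeps attribute i), an integer-label encoding for partitioning (gene i is the subset that attribute i belongs to), and a hypothetical fitness that sums per-attribute relevance scores and subtracts a fixed per-attribute cost standing in for domain knowledge. In an actual wrapper, the fitness call would instead train and validate a learner on the selected subset, which is the expensive step whose evaluation count the paper aims to reduce. All function names, relevance values, and GA parameters here are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical encoding: gene i == 1 means attribute i is selected.
Chromosome = List[int]


def decode(chrom: Sequence[int], names: Sequence[str]) -> List[str]:
    """Return the attribute names selected by a bitstring chromosome."""
    return [name for name, bit in zip(names, chrom) if bit]


def decode_partition(labels: Sequence[int], names: Sequence[str]) -> Dict[int, List[str]]:
    """Hypothetical partition encoding: gene i is the subset label of attribute i."""
    groups: Dict[int, List[str]] = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


def make_fitness(relevance: Sequence[float], cost: float) -> Callable[[Sequence[int]], float]:
    """Assumed fitness: summed relevance of selected attributes minus a per-attribute cost.

    The cost term stands in for domain knowledge; a wrapper would instead
    return the validation accuracy of a learner trained on the subset.
    """
    def fitness(chrom: Sequence[int]) -> float:
        gain = sum(r for r, bit in zip(relevance, chrom) if bit)
        return gain - cost * sum(chrom)
    return fitness


def tournament(pop: List[Chromosome], fits: List[float], rng: random.Random, k: int = 3) -> Chromosome:
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """One-point crossover (requires at least two genes)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chrom]


def run_ga(n_attrs: int, fitness: Callable[[Sequence[int]], float], pop_size: int = 30,
           generations: int = 40, mutation_rate: float = 0.05, seed: int = 0) -> Chromosome:
    """Generational GA with tournament selection and one-individual elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        children = [best[:]]  # elitism: the best-so-far survives unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fits, rng)
            p2 = tournament(pop, fits, rng)
            children.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        pop = children
        best = max(pop, key=fitness)  # ties resolve to the elite at index 0
    return best


if __name__ == "__main__":
    names = ["a0", "a1", "a2", "a3", "a4", "a5"]
    fitness = make_fitness([0.9, 0.05, 0.8, 0.02, 0.7, 0.01], cost=0.1)
    best = run_ga(len(names), fitness)
    print(decode(best, names), round(fitness(best), 3))
    print(decode_partition([0, 1, 0, 2, 1, 2], names))
```

With these made-up relevance values, any attribute whose relevance falls below the cost lowers fitness, so the search should settle on a0, a2, and a4. Bitstrings were chosen because they work directly with standard one-point crossover and bit-flip mutation, and elitism guarantees the best subset found never degrades between generations. None of this reflects results reported in the paper.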

Similar sources

Genetic Algorithms for Reformulation of Large-Scale KDD Problems with Many Irrelevant Attributes

The goal of this research is to apply genetic implementations of algorithms for selection, partitioning, and synthesis of attributes in large-scale data mining problems. Domain knowledge about these operators has been shown to reduce the number of fitness evaluations for candidate attributes. We report results on genetic optimization of attribute selection problems and current work on attribute ...

Using Data Mining and Three Decision Tree Algorithms to Optimize the Repair and Maintenance Process

The purpose of this research is to predict device failures using a data mining tool. To this end, an appropriate database of 392 records of failures at a pharmaceutical company in 1394 was first compiled; in the next step, 9 characteristics were analyzed, with the type of failure used as the database class. In this regard, three decision tree algorithms have ...

A Comparative Study between a Pseudo-Forward Equation (PFE) and Intelligence Methods for the Characterization of the North Sea Reservoir

This paper presents a comparative study between three versions of adaptive neuro-fuzzy inference system (ANFIS) algorithms and a pseudo-forward equation (PFE) to characterize the North Sea reservoir (F3 block) based on seismic data. According to the statistical studies, four attributes (energy, envelope, spectral decomposition and similarity) are known to be useful as fundamental attributes in ...

A Comprehensive Study of Several Meta-Heuristic Algorithms for Open-Pit Mine Production Scheduling Problem Considering Grade Uncertainty

Finding a global optimum is significant in problems of large dimensional scale, in order to increase the quality of decision-making in mining operations. It has been broadly confirmed that the long-term production scheduling (LTPS) problem plays a major role in mining projects in improving performance with respect to satisfying constraints, while maximizing the whole...

Solving Re-entrant No-wait Flexible Flowshop Scheduling Problem; Using the Bottleneck-based Heuristic and Genetic Algorithm

In this paper, we study the re-entrant no-wait flexible flowshop scheduling problem with makespan minimization objective and then consider two parallel machines for each stage. The main characteristic of a re-entrant environment is that at least one job is likely to visit certain stages more than once during the process. The no-wait property describes a situation in which every job has its own ...

Publication date: 2002